Simple statistical procedures for analyzing error data, e.g., in digital
data transmission systems, are usually based on the assumption of in- 
dependence. This paper studies the performance and potential utility of 
such simple statistical procedures in the case of nonindependent error 
occurrences. The burst noise model is selected for this purpose because of 
its neatness, its mathematical tractability, its built-in structure of de- 
pendence, and its importance in communication theory. We show that 
statistical procedures designed under the assumption of independence tend 
to be conservative for the burst noise model. For example, the usual bi- 
nomial test will reject, on the average, more channels with small error 
rates than it would if the errors were independent. The case in which the
sample size n and the error rate p converge in such a way that $np \to \mu_0$
is also studied. It is shown that the error process can be approximated by a
compound Poisson process in continuous time t. The statistical implica- 
tions of this fact are also discussed. 

I. INTRODUCTION 

A dilemma long existing in the theory and applications of digital 
data transmission is the precise determination of the error structure. 
On the one hand, it is a well-recognized fact that errors do not occur 
independently; on the other hand, only the assumption of indepen- 
dence offers us a model sufficiently tractable that ordinary statistical 
procedures can be designed accordingly. A direct consequence is, 
of course, that we are using statistical methods designed for in- 
dependent observations to make statistical inferences on dependent 
data. 

The fact is, we do not have much knowledge of the error structure 
of data transmission channels. Mathematical models have been 
constructed for fitting observed data streams containing errors,
notably the burst noise model of Gilbert,¹ the Markov error process
and renewal error process of Elliott,²,³ and the binary regenerative
model of McCullough.⁴

One of the most pertinent models with a built-in dependence 
structure is Gilbert's burst noise model. It is this model that we shall 
study in this paper. One of the prime concerns of this study is the 
behavior of various statistical procedures under the burst noise model. 

Gilbert¹ constructs a model for burst noise as follows. An input
binary signal (0 or 1) is transmitted through a noisy channel with
noise z (0 or 1) so that the output is given by

$$\text{output} = \text{input} + z \pmod 2.$$

The channel can be in either of two states, good (G) or bad (B).
If, at time n, the channel is in G, there is no noise, so $z_n = 0$; if the
channel is in B, a "coin" with P[head] = h is tossed and $z_n = 1$ is
identified with a tail outcome, so that an error occurs with probability 1 - h.

The channel can shift from a good state to a bad state and vice
versa. Identify 1 as G and 2 as B and let $X_n$ denote the state of the
channel at time n. It is assumed that the process $\{X_n : n \ge 1\}$ is a
two-state Markov chain with stationary transition probabilities

$$T = \begin{bmatrix} 1-P & P \\ p & 1-p \end{bmatrix} \tag{1}$$

and initial distribution $(\pi_1, \pi_2)$.
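The channel just described can be simulated in a few lines. The following is a minimal sketch, not part of the original paper: the function names `simulate_gilbert` and `bit_error_rate` and the parameter values are illustrative assumptions, and the chain is started from the stationary distribution (p/(p + P), P/(p + P)) that Section II adopts.

```python
import random


def simulate_gilbert(n, p, P, h, seed=0):
    """Return the noise bits z_1..z_n of a Gilbert burst-noise channel.

    P: probability of moving from the good state G to the bad state B
    p: probability of moving from B back to G
    h: probability that a bit sent while in B is received correctly
    """
    rng = random.Random(seed)
    # Start in B with the stationary probability P / (p + P).
    bad = rng.random() < P / (p + P)
    z = []
    for _ in range(n):
        # No noise in G; in B an error occurs with probability 1 - h.
        z.append(1 if (bad and rng.random() >= h) else 0)
        # One step of the two-state Markov chain with matrix T in (1).
        if bad:
            bad = rng.random() >= p   # stay in B with probability 1 - p
        else:
            bad = rng.random() < P    # leave G with probability P
    return z


def bit_error_rate(p, P, h):
    """Stationary probability of a bit error, (1 - h) P / (p + P)."""
    return (1 - h) * P / (p + P)
```

For example, with p = 0.3, P = 0.1, h = 0.5 the long-run fraction of ones in `simulate_gilbert(...)` should settle near `bit_error_rate(0.3, 0.1, 0.5)`, which is 0.125, and the ones tend to arrive in clusters.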

Let $Z_n = z_1 + \cdots + z_n$ denote the number of errors among the
first n output digits (0 or 1) of the channel, where $z_i = 1$ if and only
if an error occurs at the ith bit. The statistic $Z_n$ is obviously the
quantity that will be used in any statistical procedure concerning the
bit error rate. The statistical behavior of $Z_n$ will be studied extensively
in this work.

In Section II, we derive most of the exact formulas concerning $Z_n$,
including explicit expressions for its probability-generating function
and its first and second moments. The exact form of the probability
distribution of $Z_n$ is quite involved in general. For the special case
p + P = 1, $Z_n$ reduces to the binomial variable. The quantity
$\lambda = 1 - p - P$ can thus be used as a measure of dependence; most of
the complications in this work are caused by the presence of a nonzero
$\lambda$. The effect of dependence is discussed in some detail in Section III.
Transmission in blocks of digits is considered; one of our major results
is that, in this model, the block errors and the bit errors have
essentially the same covariance structure. Thus, most
results concerning bit error rate can be transferred easily to results
about block error rate. As a corollary, the variance of $Z_n$ is obtained
as a sum of two components, one due to the sum of variances (as if
the z's were independent) and the other due to the fact that $\lambda \ne 0$
(the effect of dependence).

Since $Z_n$ is known to be asymptotically normally distributed, the
variance formula of $Z_n$ can be used to judge the effect of dependence
on the robustness of statistical procedures (i.e., on how well procedures
based on the independence assumption perform if this assumption is
violated). A general conclusion of Section IV is that statistical
procedures designed under the assumption of independence tend to be
conservative for the burst noise model. For example, the usual
binomial test will reject, on the average, more channels with small
error rate than it is supposed to.

It is shown in Section V that if the bit error rate $p \to 0$ in such a
way that $np \to \mu_0 > 0$, then $Z_n$ converges in distribution to a
compound Poisson distribution. The statistical implications of this fact
are also discussed. In particular, $Z_n$ is a minimal sufficient statistic for
$\mu_0$ (p) in some approximate sense. This justifies the use of $Z_n$ in any
statistical decision procedures concerning the error rate p.

Despite the model's simplicity, the insight we gained in studying 
this burst noise model enables us to investigate more deeply the 
structure of error processes. For example, it is possible to treat the 
underlying Markov chain $\{X_n\}$ as an s-state stationary Markov chain.
Details of this and other extensions and their implications will be 
discussed in a forthcoming report. 

II. STATISTICAL PROPERTIES OF $Z_n$

We shall assume, for simplicity and without too much loss of
generality, in the sequel that the initial distribution $(\pi_1, \pi_2)$ of the
two-state Markov chain $\{X_n\}$ agrees with its absolute stationary
distribution $(p/(p + P),\ P/(p + P))$. Under this assumption, $\{X_n\}$
is strictly stationary.

Let

$$g_n = P[Z_n = 0].$$

Note that the bit error rate p is given by

$$p_1 = 1 - g_1 = P[Z_1 \ne 0] = P[z_1 = 1] = (1 - h)P/(p + P); \tag{2}$$



1306 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1973 

and the block error rate $p_k$, the probability that a block of size k
contains at least one error, is

$$p_k = 1 - g_k. \tag{3}$$

Thus, $p_1 = p$.

Since the event $[z_i = 1]$ implies $[X_i = 2]$ and thus signifies a
return to a bad state (a recurrent event), it is possible to utilize the
renewal equation to derive an exact expression for $g_n$. The following
theorem is essentially due to Gilbert [Ref. 1, eq. (14)].



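Theorem 1: For $n \ge 1$,

$$g_n = \frac{A_1\alpha_1^{\,n+1}}{1 - \alpha_1} + \frac{A_2\alpha_2^{\,n+1}}{1 - \alpha_2}, \tag{4}$$

where

$$\alpha_1 = \tfrac{1}{2}\Big\{-(1 - h)(1 - p) - (p + P - 2) + \sqrt{[(1 - P) - h(1 - p)]^2 + 4pPh}\Big\},$$
$$\alpha_2 = \tfrac{1}{2}\Big\{-(1 - h)(1 - p) - (p + P - 2) - \sqrt{[(1 - P) - h(1 - p)]^2 + 4pPh}\Big\},$$
$$A_1 = p[\alpha_1 + (p + P - 1)]/\alpha_1(\alpha_1 - \alpha_2),$$
$$A_2 = p[\alpha_2 + (p + P - 1)]/\alpha_2(\alpha_2 - \alpha_1).$$

Here the leading factor p in $A_1$ and $A_2$ is the bit error rate of (2); every
other p and P is a transition probability of (1).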

A proof of Theorem 1 different from that of Gilbert (and the proofs 
of all other theorems) will be presented in the appendix. We remark 
here that since a broader view and a more systematic approach is 
adopted in our new proof, it is possible to extend our method readily to 
a more general framework than a two-state Markov chain. 

Relation (4) can be viewed as a relation between bit error rate and
block error rate. If $\lambda = 1 - p - P > 0$, it can be shown that
$0 < \alpha_2 < \alpha_1 < 1$ so that $g_n \to 0$ exponentially fast. One effect of
dependence in this model is reflected in (4), namely that $g_n$ is the sum of
two exponential terms instead of one. In general, if the underlying Markov
chain is s-state, $g_n$ will be a sum of s exponential terms.
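The two-exponential form of $g_n$ can be checked numerically. The sketch below is illustrative only: it reads (4) as $g_n = A_1\alpha_1^{n+1}/(1-\alpha_1) + A_2\alpha_2^{n+1}/(1-\alpha_2)$, with the bit error rate (2) as the leading factor of $A_1$ and $A_2$ (an assumption about the notation), and compares it with a direct bit-by-bit recursion. The function names are hypothetical, and the closed form requires $h\lambda \ne 0$ so that $\alpha_2 \ne 0$.

```python
import math


def g_closed(n, p, P, h):
    """g_n = P[Z_n = 0] from the two-exponential closed form (4)."""
    ber = (1 - h) * P / (p + P)              # bit error rate, eq. (2)
    trace = (1 - P) + h * (1 - p)
    root = math.sqrt(((1 - P) - h * (1 - p)) ** 2 + 4 * p * P * h)
    a1, a2 = (trace + root) / 2, (trace - root) / 2
    lam = 1 - p - P
    A1 = ber * (a1 - lam) / (a1 * (a1 - a2))
    A2 = ber * (a2 - lam) / (a2 * (a2 - a1))
    return A1 * a1 ** (n + 1) / (1 - a1) + A2 * a2 ** (n + 1) / (1 - a2)


def g_direct(n, p, P, h):
    """g_n by propagating the error-free probability mass one bit at a time."""
    good, bad = p / (p + P), P / (p + P)     # stationary start
    for _ in range(n):
        # Move one step with T, then keep only the error-free part of state B.
        good, bad = good * (1 - P) + bad * p, h * (good * P + bad * (1 - p))
    return good + bad
```

With p = 0.3, P = 0.1, h = 0.5 both functions give $g_1 = 0.875$, i.e. one minus the bit error rate 0.125, and they agree for larger n as well.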

The right-hand side of (4) is a function of p, P, and h. We shall
write $g_n = g_n(p, P, h)$ when we want to emphasize this point. An
important connection between $g_n$ and $E u^{Z_n}$, the probability-generating
function (PGF) of $Z_n$, is stated in Theorem 2.

Theorem 2: The probability-generating function of $Z_n$ is given by

$$E u^{Z_n} = g_n(p, P, H), \tag{5}$$

where

$$H = (1 - h)u + h.$$

Thus, replacing each h by (1 - h)u + h in (4), we obtain the PGF
of $Z_n$. The exact expressions for $P[Z_n = i]$ are involved unless i is small.
Using (4) and the fact that $0 < \alpha_2 < \alpha_1 < 1$, it is possible to express
$P[Z_n = i]$ approximately in terms of its leading term as



n*- «^(*)«H. 



(6) 



Relation (6) can be used to establish the Poisson convergence of $Z_n$
if $p = \mu_0/n \to 0$. However, an indirect proof will be presented later.
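Theorem 2 lends itself to a brute-force check for small n. The sketch below (hypothetical function names, illustrative parameters) sums $u^{Z_n}$ over every state path and every noise pattern and compares the result with $g_n(p, P, H)$, where $g_n$ is computed by a simple forward recursion rather than by (4).

```python
from itertools import product


def pgf_by_enumeration(n, p, P, h, u):
    """E[u^{Z_n}] summed over all 2^n state paths and 2^n noise patterns."""
    T = {(1, 1): 1 - P, (1, 2): P, (2, 1): p, (2, 2): 1 - p}
    pi = {1: p / (p + P), 2: P / (p + P)}    # stationary start
    err = {1: 0.0, 2: 1 - h}                 # P[z_i = 1 | X_i]
    total = 0.0
    for xs in product((1, 2), repeat=n):
        path = pi[xs[0]]
        for a, b in zip(xs, xs[1:]):
            path *= T[(a, b)]
        for zs in product((0, 1), repeat=n):
            w = path
            for x, z in zip(xs, zs):
                w *= err[x] if z else 1 - err[x]
            total += w * u ** sum(zs)
    return total


def g(n, p, P, h):
    """g_n(p, P, h) = P[Z_n = 0] by forward recursion over the two states."""
    good, bad = p / (p + P), P / (p + P)
    for _ in range(n):
        good, bad = good * (1 - P) + bad * p, h * (good * P + bad * (1 - p))
    return good + bad


def pgf_theorem2(n, p, P, h, u):
    """E[u^{Z_n}] via Theorem 2: replace h by H = (1 - h) u + h in g_n."""
    return g(n, p, P, (1 - h) * u + h)
```

Setting u = 1 gives 1 in both functions, and u = 0 recovers $g_n$ itself.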
Moments of $Z_n$ can be obtained by differentiating the right-hand
side of (5) and setting u = 1. Specifically, we have

$$E Z_n = np, \tag{7}$$

$$\operatorname{Var} Z_n = np(1 - p) + 2C\left[\frac{(n - 1)\lambda}{1 - \lambda} - \frac{\lambda^2(1 - \lambda^{n-1})}{(1 - \lambda)^2}\right], \tag{8}$$

where

$$C = (1 - h)^2\pi_1\pi_2, \qquad \lambda = 1 - p - P.$$

Relation (8) also can be obtained by other methods which we shall 
discuss in Section III. 

III. MEASURE OF DEPENDENCE AND ITS EFFECT 

If the transition matrix of a Markov chain has identical rows, then 
this Markov chain is merely a sequence of independent and identically 
distributed (iid) random variables. For the two-state Markov chain
$\{X_n\}$ underlying this burst noise model, the matrix T in (1) has
identical rows if and only if p + P = 1. Letting $\lambda = 1 - p - P$, we
see that $|\lambda| \le 1$ and that $\lambda = 0$ if and only if the channel is
memoryless.

The eigenvalues of the transition matrix play important roles in 
the theory of Markov chains. The largest (in absolute value) eigen- 
value is always 1 ; in general, it is the second largest eigenvalue that 
affects all the essential features of a Markov chain. The parameter $\lambda$
defined earlier is the second largest eigenvalue of the matrix T in (1).

The significance of the parameter $\lambda$ can be interpreted intuitively.
If p and P are small, the underlying Markov chain $\{X_n\}$ tends to
stay in a certain state (G or B) once it enters this state; hence, $\lambda > 0$
indicates the tendency of producing bursty errors. If both p and P
are large, then $\{X_n\}$ tends to shift between the good state and bad
state alternately. Since the latter case is obviously not very
interesting, we shall always assume $\lambda \ge 0$ in the sequel.

Let $\Pi$ denote the 2 × 2 matrix with identical rows

$$\Pi = \begin{bmatrix} \pi_1 & \pi_2 \\ \pi_1 & \pi_2 \end{bmatrix},$$

where $(\pi_1, \pi_2)$ is the absolute stationary distribution of $\{X_n\}$. By the
definition of the absolute stationary distribution and by some simple
calculations, it can be seen that

$$\Pi T = T\Pi = \Pi^2 = \Pi. \tag{9}$$

It follows from (9) and simple induction that, for $n \ge 1$,

$$T^n - \Pi = (T - \Pi)^n = \lambda^n \begin{bmatrix} \pi_2 & -\pi_2 \\ -\pi_1 & \pi_1 \end{bmatrix}. \tag{10}$$

Relation (10) allows us to calculate the n-step transition
probabilities of $\{X_n\}$ accurately. It can also be used to find the covariance
of $z_i$ and $z_j$. We restate eq. (17) of Ref. 1 as follows:
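Relation (10) is easy to confirm numerically. The following sketch, with hypothetical helper names, compares the n-step transition matrix obtained by repeated multiplication with the closed form $\Pi + \lambda^n[[\pi_2, -\pi_2], [-\pi_1, \pi_1]]$.

```python
def n_step_by_multiplication(p, P, n):
    """T^n computed by multiplying the transition matrix T of (1) n times."""
    T = [[1 - P, P], [p, 1 - p]]
    M = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        M = [[sum(M[i][k] * T[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return M


def n_step_closed_form(p, P, n):
    """T^n from relation (10): the stationary part plus a lambda^n correction."""
    pi1, pi2 = p / (p + P), P / (p + P)
    decay = (1 - p - P) ** n
    return [[pi1 + decay * pi2, pi2 - decay * pi2],
            [pi1 - decay * pi1, pi2 + decay * pi1]]
```

The correction term shrinks geometrically at rate $\lambda$, which is why $\lambda$ governs how quickly the chain forgets its starting state.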

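Theorem 3: The covariance of $z_i$, $z_j$ ($i \ne j$) is given by

$$\operatorname{Cov}(z_i, z_j) = C\lambda^{|i-j|}, \tag{11}$$

where $C = (1 - h)^2\pi_1\pi_2$.

Corollary:

$$\operatorname{Var}(Z_n) = np(1 - p) + 2C\left[\frac{(n - 1)\lambda}{1 - \lambda} - \frac{\lambda^2(1 - \lambda^{n-1})}{(1 - \lambda)^2}\right]. \tag{12}$$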
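The corollary follows by summing the covariances (11) over all pairs of bits. The sketch below, with hypothetical function names, evaluates the closed form (12) and the explicit double sum so the two can be compared; the variable `ber` stands for the bit error rate of (2), which the text also writes as p.

```python
def var_closed_form(n, p, P, h):
    """Var(Z_n) from the closed form (12)."""
    pi1, pi2 = p / (p + P), P / (p + P)
    ber = (1 - h) * pi2                      # bit error rate, eq. (2)
    lam = 1 - p - P
    C = (1 - h) ** 2 * pi1 * pi2
    bracket = ((n - 1) * lam / (1 - lam)
               - lam ** 2 * (1 - lam ** (n - 1)) / (1 - lam) ** 2)
    return n * ber * (1 - ber) + 2 * C * bracket


def var_by_covariance_sum(n, p, P, h):
    """Var(Z_n) as n Var(z_1) plus twice the sum of Cov(z_i, z_j), i < j."""
    pi1, pi2 = p / (p + P), P / (p + P)
    ber = (1 - h) * pi2
    lam = 1 - p - P
    C = (1 - h) ** 2 * pi1 * pi2
    cross = sum(C * lam ** (j - i)
                for i in range(1, n + 1) for j in range(i + 1, n + 1))
    return n * ber * (1 - ber) + 2 * cross
```

When $\lambda = 0$ the bracket vanishes and both functions reduce to the binomial variance.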

Define, for i = 1, 2, ···,

$$T_i = \begin{cases} 1 & \text{if } z_{(i-1)k+1} + z_{(i-1)k+2} + \cdots + z_{ik} \ge 1, \\ 0 & \text{otherwise;} \end{cases} \tag{13}$$

namely, $T_i = 1$ if and only if the ith block of length k is not error-free.
It is possible to extend eq. (11), and therefore (12), to the
corresponding equations involving the T's.

Theorem 4: There exists $0 < C_1 < \infty$ such that

$$\operatorname{Cov}(T_i, T_j) = C_1\lambda^{k|i-j|}. \tag{14}$$

The value of $C_1$ can be found explicitly. However, we shall be satisfied
with a crude estimate $C_1 = C_2\pi_1\pi_2\lambda^{1-k}$ where $|C_2| \le 1$.

Note that $T_i = z_i$ if k = 1. In this case, eq. (14) reduces to (11).
Theorem 4 not only states that the T's are "less dependent" than the
z's but it also tells us, in some sense, how much less dependent the
T's are. Let

$$S_n = T_1 + T_2 + \cdots + T_n.$$

The statistic $S_n$ is the obvious statistical quantity to analyze if digits
are transmitted in blocks of size k. For example, in the 1969-70
Connection Survey⁵,⁶ on the Bell System Switched Telecommunications
Network conducted by Bell Laboratories, statistics of block
errors are presented for both high-speed and low-speed data
transmission. Hence, the more important implication of Theorem 4 is that
eq. (14) exhibits the same general structure as eq. (11). For example,
replacing C by $C_1$, p by $p_k$, and $\lambda$ by $\lambda^* = \lambda^k$ in (12), we immediately
obtain the formula for Var($S_n$).



Corollary:

$$\operatorname{Var}(S_n) = np_k(1 - p_k) + 2C_1\left[\frac{(n - 1)\lambda^*}{1 - \lambda^*} - \frac{\lambda^{*2}(1 - \lambda^{*\,n-1})}{(1 - \lambda^*)^2}\right]. \tag{15}$$



Consequently, statistical procedures using $S_n$ and concerning the
inferences on the block error rate $p_k$ should have essentially the same
behavior as those procedures using $Z_n$ and concerning the bit error
rate p. The above reasoning implies that, at least as far as the large 
sample properties are concerned, it is sufficient to consider inference 
on p only. 

Both the law of large numbers and the central limit theorem hold 
true for the sum of Markovian random variables; see, for example, 
Ref. 7. Hence,

$$\frac{S_n}{n} \to p_k \tag{16}$$

with probability 1; and

$$P\left[\frac{S_n - np_k}{\sqrt{\operatorname{Var} S_n}} \le v\right] \to \Phi(v) \tag{17}$$

for each $-\infty < v < \infty$, where $\Phi(v)$ denotes the cumulative
distribution function of an N(0, 1) random variable. Relations (16) and
(17) will be used in Section IV to discuss the robustness of some
statistical procedures concerning inferences on $p_k$.

IV. STATISTICAL INFERENCES ON $p_k$

For simplicity, we shall consider the special case k = 1 and
concentrate our discussion on problems of statistical estimation and
hypothesis testing of $p = p_1$. As remarked earlier in Section III, the
restriction k = 1 can easily be removed to cover the general case.

Since $\{X_n\}$ is assumed to be stationary, so is $\{z_n\}$; we have seen that

$$E[Z_n] = np, \tag{18}$$

so that the obvious estimator $\hat p_n = Z_n/n$ of p is unbiased. Relation
(16), specialized for the case k = 1, states that $\hat p_n$ is a strongly
consistent estimator of p.

Very few (optimal) small sample properties of $\hat p_n$ can be stated,
however. For $n \ge 3$, it can be shown that no uniformly minimum
variance estimator of p exists. Nevertheless, it is intuitively obvious
that $\hat p_n$ is about the best we can do if the z's are the only observables.
From (12),

$$n\operatorname{Var}(\hat p_n) = p(1 - p) + 2C\frac{\lambda}{1 - \lambda} + o(1) \triangleq \sigma^2 + \Delta + o(1), \tag{19}$$

where

$$\sigma^2 = p(1 - p), \qquad \Delta = 2(1 - h)^2\pi_1\pi_2\lambda/(1 - \lambda).$$

Note that the term p(1 - p) in eq. (19) corresponds to $n\operatorname{Var}(\hat p_n)$ if
the z's were independent. Since we have assumed that $\lambda \ge 0$, it follows
that $\Delta \ge 0$ and $n\operatorname{Var}(\hat p_n) \ge \sigma^2$. Thus, the presence of a positive $\lambda$
actually causes loss of efficiency in estimating p. Writing $\Delta = \tau^2$,
we see that if the parameters h, p, and P (hence $\sigma^2$ and $\tau^2$) can be
estimated from the data, the loss of efficiency due to dependence can
be estimated as the ratio $\hat\tau/\hat\sigma$, where $\hat\tau$ and $\hat\sigma$ denote the estimates of
$\tau$ and $\sigma$ from the sample. Hence, if control or confidence limits are
used to evaluate the channel performance, the actual 3 standard
deviation or 2 standard deviation limits should be wider by $100(\tau/\sigma)$
percent.
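The two variance components in (19) are simple to evaluate for given channel parameters. The following sketch uses a hypothetical function name and illustrative parameter values; it returns $\sigma^2$, $\Delta = \tau^2$, and the ratio $\tau/\sigma$ quoted in the text.

```python
import math


def efficiency_loss(p, P, h):
    """Return (sigma^2, Delta, tau/sigma) for the asymptotic variance in (19)."""
    pi1, pi2 = p / (p + P), P / (p + P)
    ber = (1 - h) * pi2                      # bit error rate, eq. (2)
    lam = 1 - p - P                          # measure of dependence
    sigma2 = ber * (1 - ber)                 # independent-errors component
    delta = 2 * (1 - h) ** 2 * pi1 * pi2 * lam / (1 - lam)
    return sigma2, delta, math.sqrt(delta / sigma2)
```

For p = 0.3, P = 0.1, h = 0.5 this gives $\sigma^2 = 0.109375$ and $\Delta = 0.140625$, so in that example the dependence term is larger than the binomial term.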

We may also consider the loss of power for statistical tests for
$H_0 : p \le p_0$ of the form

reject $H_0$ if $Z_n \ge C^*$.

Based on the assumption of independence, the power function is
approximately

$$\beta_I = 1 - \Phi\left(\frac{C^* - np}{\sqrt{n}\,\sigma}\right); \tag{20}$$

whereas for our model, the power is approximately

$$\beta_D = 1 - \Phi\left(\frac{C^* - np}{\sqrt{n}\,\sqrt{\sigma^2 + \tau^2}}\right). \tag{21}$$

[Fig. 1. Comparison of power functions: power versus bit error rate p.]

If the first-type error $\alpha \le 0.5$, we see from (20) that $C^* - np \ge 0$. We
see that $\beta_I \le \beta_D$ if $C^* - np \ge 0$ and $\beta_I \ge \beta_D$ otherwise. This means
that it might be possible to design more powerful tests for $H_0$ based
on the knowledge that the dependent model obtains. On the other
hand, the test is conservative in the sense that it may reject more
channels than expected if the bit error rate p is close to the service
objective $p_0$ and if the dependent model obtains. The rules of the game
shift in the other direction if $C^* - np < 0$. However, it is the smaller
values of p that we are really concerned with and we may claim that
the test based on the assumption of independence gives a pessimistic
estimate of channel reliability (see Fig. 1).
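The comparison of the two power functions can be reproduced with the normal approximation. In the sketch below the independent-model power follows (20); the dependent-model power is taken to be the same expression with $\sigma^2$ replaced by the asymptotic variance $\sigma^2 + \Delta$ of (19), which is an assumption of this sketch. Function names and numerical values are illustrative.

```python
import math


def normal_cdf(v):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def power_independent(n, ber, c_star):
    """Approximate P[Z_n >= C*] when bit errors are treated as independent."""
    sigma = math.sqrt(ber * (1 - ber))
    return 1 - normal_cdf((c_star - n * ber) / (math.sqrt(n) * sigma))


def power_dependent(n, ber, c_star, delta):
    """Same approximation with per-bit variance inflated by delta (assumed form)."""
    s = math.sqrt(ber * (1 - ber) + delta)
    return 1 - normal_cdf((c_star - n * ber) / (math.sqrt(n) * s))
```

With n = 10000, C* = 1300 and $\Delta = 0.14$, a channel with bit error rate 0.125 (so that C* - np > 0) is rejected more often under the dependent model, while at bit error rate 0.135 the ordering reverses.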



V. POISSON APPROXIMATIONS 



The bit error rates of high-speed digital channels are usually very
small, say $10^{-6}$ to $10^{-8}$; therefore, the normal approximation and the
statistical theory discussed earlier may not be too helpful in practice
unless n is large. In this section, we prove that $Z_n$ converges in
distribution to a Poisson distribution if $np \to \mu_0$ in a suitable way. Using
this result, we construct a Poisson process in continuous time t that
approximates the process $\{Z_n(t) : t > 0\}$ where n denotes the number of
transmitted digits per unit time.

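We have shown earlier in (2) that the error rate p is given by

$$p = (1 - h)P/(p + P). \tag{22}$$

Here the p on the left is the bit error rate; the p on the right, like P,
is a transition probability of (1).

If $p \to 0$ in such a way that $np \to \mu_0 > 0$, what do we expect to be the
asymptotic distribution for $Z_n = z_1 + \cdots + z_n$, the number of errors
in the first n digits? Note that we have quite a few choices for the
convergence $np \to \mu_0$. For example, keeping the transition probability p
fixed and letting $P = (\mu/n)^{\epsilon_1}$, $1 - h = (\mu/n)^{\epsilon_2}$, $\epsilon_1 + \epsilon_2 = 1$,
we have, by (8),

$$\operatorname{Var}(Z_n) \approx \mu_0(1 - p) + 2Cn\frac{\lambda}{1 - \lambda}, \tag{23}$$

where $\mu_0 = \mu/p$. Also,

$$E Z_n = np = \mu_0.$$

Hence, if $\epsilon_2 = 0$ is selected, we see that for large n, $\operatorname{Var} Z_n \ne E Z_n$
so that the limiting distribution of $Z_n$ cannot be Poisson.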

In order that $p = (1 - h)P/(p + P) \approx \mu_0/n$, the most general
choice of h and P would be

$$1 - h = a_1x + a_2x^2 + a_3x^3 + \cdots, \qquad P = b_1y + b_2y^2 + b_3y^3 + \cdots, \tag{24}$$

where $x = n^{-\epsilon_2}$, $y = n^{-\epsilon_1}$, $\epsilon_1 + \epsilon_2 = 1$, $\epsilon_1 \ge 0$, $\epsilon_2 > 0$, and
$a_1b_1/p = \mu_0$ (the case $\epsilon_2 = 0$ is of particular interest and will be
considered separately later). We state the main theorem of this section as
follows:

Theorem 5: If $p \to 0$ in such a way that (24) holds, then

$$P[Z_n = i] \to \frac{e^{-\mu_0}\mu_0^{\,i}}{i!}$$

as $n \to \infty$, where $\mu_0 = a_1b_1/p$. Furthermore, the convergence is uniform
in i = 0, 1, 2, ···.

By using the result of Theorem 5, we may construct a Poisson
process in continuous time t as an approximation to the process of
partial sums $\{Z_n : n \ge 1\}$. Suppose the underlying channel can
transmit n digits per unit time. Let $Z_n(t)$ denote the number of errors
in (0, t). Theorem 5 states that, for i = 0, 1, 2, ···,

$$P[Z_n(t) = i] \to \frac{e^{-\mu t}(\mu t)^i}{i!};$$

here $\mu$ denotes the limiting error rate per unit time. Let Z(t) denote
the number of errors in (0, t) in the limiting case. The fact that Z(t)
is a process with independent increments, namely that Z(t) is indeed
a Poisson process, is easy to prove and we shall omit it.

Theorem 5 implies that $Z_n$ is asymptotically a minimal sufficient
statistic for the bit error rate p if (24) can be justified; this provides
theoretical support for the use of $Z_n$ in any statistical inferences
concerning p. We remark here that, by replacing p by $p_k$ and $Z_n$ by $S_n$,
the same comment applies for block error rates. Another consequence
of Theorem 5 is that

$$\operatorname{Var}(Z_n) \to \mu_0 = a_1b_1/p. \tag{25}$$

Note that if $\lambda = 1 - p - P = 0$ (the independent case), (24) implies
$P \to 0$ and this in turn implies $p \to 1$. From (25), we see that the limit of
Var($Z_n$) is minimized in the independent case. The increase of variance due to
dependence is therefore 100(1/p - 1) percent. Hence, in the dependent
case, the confidence interval for p should be wider than we thought
in the independent case.

The null hypothesis $H_0 : p \le p_0$ becomes $H_0' : \mu_0 \le \mu_0^*$ in the limiting
case. The uniformly most powerful test for $H_0'$ exists and is given by
the rule:

reject $H_0'$ if $Z_n \ge C^*$.

Based on the approximation that $Z_n$ is Poisson, we may compute the
power functions as

$$\beta(\mu_0) = P[Z_n \ge C^* \mid \mu_0] = \sum_{i=C^*}^{\infty}\frac{e^{-\mu_0}\mu_0^{\,i}}{i!} = \frac{1}{(C^* - 1)!}\int_0^{\mu_0} e^{-x}x^{C^*-1}\,dx$$
and 

It follows that $\beta_I \le \beta_D$ so that a test for $H_0$ based on the assumption
of independence and used when dependence is present rejects more
channels than it should. In other words, tests designed for independent
observations protect customers in the sense that channels they are
using may have better quality than inferred.

The effect of dependence reported for both the binomial and the
Poisson cases has an intuitive explanation. By using $Z_n$ or $S_n$, we are
actually abandoning some of the information contained in the sequence
$z_1, z_2, \cdots$, so that statistical inferences based on $Z_n$ or $S_n$ tend to be
more conservative in the sense that channel reliability is estimated
pessimistically.

We now return to (24) and consider the special case $\epsilon_2 = 0$. This
case cannot be ignored because previous papers, for example Ref. 1,
indicate that sometimes $h \approx 0.5$ (rather than 0.999) is a reasonable
value. The fact that Poisson processes do not describe certain error
processes well has also been reported in the literature.

If $\epsilon_2 = 0$, eq. (24) reduces to

$$P = [b_1 + o(1)]/n, \qquad b_1 > 0. \tag{26}$$

We have

Theorem 6: If (26) holds, then

$$E u^{Z_n} \to \exp\left[-\frac{b_1(1 - H)}{1 - (1 - p)H}\right], \tag{27}$$

where H = (1 - h)u + h.

We remark here that the limiting value in eq. (27) is the PGF of a
compound Poisson distribution. More specifically, let N be a Poisson
variable with mean $b_1/(1 - p)$, and let $W_1, W_2, \cdots$ be iid random
variables with the geometric distribution

$$P[W_j = i] = \frac{p}{1 - (1 - p)h}\left[\frac{(1 - p)(1 - h)}{1 - (1 - p)h}\right]^i, \qquad i = 0, 1, 2, \cdots. \tag{28}$$

If the W's are independent of N, then the limiting value in (27) is
simply $E u^{W_1 + W_2 + \cdots + W_N}$. It is of course possible to introduce a
continuous time parameter t and consider the following random
mechanism which describes the bursty nature of this error process vividly.
The bursts are generated by a Poisson process; given that a burst
occurs, the errors are generated by a geometric distribution.

From the right-hand side of (27), it is possible to compute the
moments of the limiting distribution of $Z_n$. We have

$$E(W_1 + W_2 + \cdots + W_N) = \frac{b_1(1 - h)}{p}, \tag{29}$$

$$\operatorname{Var}(W_1 + W_2 + \cdots + W_N) = \frac{b_1(1 - h)}{p} + \frac{2b_1(1 - h)^2(1 - p)}{p^2}. \tag{30}$$

Note that the variance is always larger than the mean in this case.
Note also that, as h approaches 1, the second term on the right-hand
side of eq. (30) is of higher order and vanishes in the limiting case.
It is also possible to show that, for a fixed mean, the right-hand
side of (30) is minimized at p = 1, and that as p approaches 1 the
limiting distribution is Poisson.
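The moments of the limiting compound Poisson distribution can be cross-checked against the geometric burst-size distribution. The sketch below is illustrative: it assumes (28) has the form $P[W = i] = (p/D)\,r^i$ with $D = 1 - (1-p)h$ and $r = (1-p)(1-h)/D$, takes the Poisson mean to be $b_1/(1-p)$, and uses the standard facts that a compound Poisson sum has mean $\theta E[W]$ and variance $\theta E[W^2]$. Function names are hypothetical.

```python
def limit_moments_from_pmf(b1, p, h, terms=2000):
    """Mean and variance of W_1 + ... + W_N from the burst-size pmf (28)."""
    D = 1 - (1 - p) * h
    r = (1 - p) * (1 - h) / D                # geometric ratio
    theta = b1 / (1 - p)                     # Poisson mean of N
    pmf = [(p / D) * r ** i for i in range(terms)]
    m1 = sum(i * q for i, q in enumerate(pmf))       # E[W]
    m2 = sum(i * i * q for i, q in enumerate(pmf))   # E[W^2]
    return theta * m1, theta * m2


def limit_moments_closed_form(b1, p, h):
    """Mean (29) and variance (30) of the limiting distribution."""
    mean = b1 * (1 - h) / p
    var = mean + 2 * b1 * (1 - h) ** 2 * (1 - p) / p ** 2
    return mean, var
```

For $b_1 = 2$, p = 0.3, h = 0.5 both routes give a mean of about 3.33 and a variance of about 11.11, illustrating that the variance exceeds the mean whenever p < 1 and h < 1.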

Branching renewal processes have been suggested in the literature⁸
as a model for series of events. The basic structure for branching
renewal processes can be described in terms of our problem as follows:
The series of primary events (bursts) are generated by a Poisson
process. Each of these primary events generates a subsidiary series
of events (bit errors), separated by the waiting times $Y_1, Y_2, \cdots, Y_S$,
where S is random. If we assume that these subsidiary series of events
take no time, then the branching renewal process reduces to the
compound Poisson process.

VI. CONCLUSIONS AND FURTHER EXTENSIONS 

(i) The burst noise model of Gilbert discussed in this paper
provides a vehicle for studying the robustness of some fixed sample size
statistical procedures. The general result is that the presence of
dependence increases the variance of the random variable $Z_n$, for
the case where the bit error rate p is fixed and the case in which
$p = [\mu_0 + o(1)]/n$. Thus, use of statistical tests based on the
assumption of independence increases the power at the cost of rejecting more
satisfactory channels than would be rejected if dependence were
absent. The use of blocks does reduce the covariances among errors
compared with bits or smaller blocks. However, the covariance
structure among the blocks is essentially the same as that among
the bits.

(ii) Although the dependence structure of Gilbert's burst noise
model is a simple one, it is by no means a trivial one. In fact, from the
insight gained through this study, many results obtained in this paper
have generalizations in error processes defined over an s-state Markov
chain as well. A unified treatment of channels with Markov type of
memory will be reported elsewhere.

(iii) The second largest eigenvalue (in absolute value) of the
(s × s) transition matrix of the underlying Markov chain is a
parameter which should not be overlooked. It can be viewed as a measure
of dependence of a Markovian model. The effect of this parameter
($= \lambda$ in this work) is visible in many important formulas, for example,
in (14).

(iv) Another important question to ask is what kind of stochastic
process can be used to approximate the error process of a binary
channel with memory. If the bit error rate is small, we can extend the
proof of Theorem 6 (in a nontrivial way) to find an important
conclusion: the compound Poisson process can serve the purpose.

(v) The by-products of this work are also fruitful. For example,
the variance formula of $Z_n$ can be generalized to find the variance of
$T_n = f(X_1) + \cdots + f(X_n)$ where $\{X_n\}$ is an s-state Markov chain,
$s \le \infty$, and f is an arbitrary function. Since many continuous sampling
plans, such as CSP1, CSP2, CSP3, can be described as random walks
of the form $T_n$ (see Refs. 9 and 10), the application of this formula to
quality assurance is evident.

(vi) Mathematically speaking, there is an essential difference be- 
tween Gilbert's original treatment and our generalizations to the 
s-state Markov chain. More specifically, Gilbert viewed his problem 
as one of the renewal type whereas the s-state Markov case should 
be handled by the semigroup property (of taboo probabilities). We 
remark here that many results of the theory of recurrent events (see, 
for example, Ref. 11) can be applied to Gilbert's model. We also 
remark that the renewal process is a one-state semi-Markov process. 
A general question can be raised at this point : What is the behavior 
of an s-state semi-Markov channel? Since it is known that distri- 
butions other than the exponential (for example, the Pareto distri- 
bution, see Ref. 12) describe the waiting time distribution well, the 
question raised is a realistic one and should not be merely considered 
as an attempt at mathematical generality.

APPENDIX

A.1 Proof of Theorem 1

Consider $Y_n = (X_n, z_n)$ as a three-state Markov chain with
transition matrix

$$Q = (q_{ij}) = \begin{array}{c|ccc} & (G,0) & (B,0) & (B,1) \\ \hline (G,0) & 1-P & hP & (1-h)P \\ (B,0) & p & h(1-p) & (1-h)(1-p) \\ (B,1) & p & h(1-p) & (1-h)(1-p) \end{array} \tag{31}$$

say. We have

$$E u^{Z_n} = E u^{z_1 + z_2 + \cdots + z_n} = \sum_{y_0}\sum_{y_1}\cdots\sum_{y_n} u^{z_1 + \cdots + z_n}\, q_{y_{n-1}y_n}\, q_{y_{n-2}y_{n-1}} \cdots q_{y_0y_1}\,\lambda_{y_0}, \tag{32}$$

where $\lambda_{y_0} = P[Y_0 = y_0]$. Note that the value of $z_i$ is completely
determined by the value $Y_i = y_i$. Let

$$r_{ij} = q_{ij}\,u^{z(j)},$$

where z(j) denotes the value of z in state j, and let

$$R = (r_{ij}).$$

Relation (32) can then be written as

$$E u^{Z_n} = \boldsymbol{\lambda} R^n \mathbf{1}, \tag{33}$$

where

$$\boldsymbol{\lambda} = (\lambda_{(G,0)}, \lambda_{(B,0)}, \lambda_{(B,1)}), \qquad \mathbf{1}' = (1, 1, 1),$$

$$R = (r_{ij}) = Q\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & u \end{bmatrix}.$$

We remark here that eq. (33) can be extended to the case of an s-state
Markov chain easily. Letting u = 0 in eq. (33), the PGF of $Z_n$, we have

$$P[Z_n = 0] = \boldsymbol{\lambda}\left(Q\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\right)^{n}\mathbf{1}. \tag{34}$$

The last column of the 3 × 3 matrix in eq. (34) is always a zero vector
for every $n \ge 1$. Hence, the right-hand side of (34) is essentially the
nth power of a 2 × 2 matrix. The explicit formula for $g_n$ in eq. (4)
follows from (34) by straightforward calculations.

A.2 Proof of Theorem 2

The z's are conditionally independent if the values of the X's are
given. Hence,

$$P[Z_n = 0] = E\{P[z_1 = z_2 = \cdots = z_n = 0 \mid X_1, X_2, \cdots, X_n]\} = E\prod_{i=1}^{n} P[z_i = 0 \mid X_i] = E\prod_{i=1}^{n} h^{X_i - 1} = E\,h^{X_1 + X_2 + \cdots + X_n - n}. \tag{35}$$

Similarly,

$$E u^{Z_n} = E\{E[u^{z_1 + \cdots + z_n} \mid X_1, X_2, \cdots, X_n]\} = E\prod_{i=1}^{n}[h + (1 - h)u]^{X_i - 1} = E\,H^{X_1 + X_2 + \cdots + X_n - n}, \tag{36}$$

where H = h + (1 - h)u. By comparing eqs. (35) and (36), Theorem
2 follows.

A.3 Proof of Theorem 3

By eq. (10),

$$P[X_n = 2 \mid X_0 = 2] = \pi_2 + \lambda^n\pi_1.$$

Hence,

$$\operatorname{Cov}(z_i, z_j) = P[z_i = z_j = 1] - p^2 = (1 - h)^2P[X_i = X_j = 2] - p^2 = (1 - h)^2\pi_2(\pi_2 + \pi_1\lambda^{|i-j|}) - p^2 = \pi_1\pi_2(1 - h)^2\lambda^{|i-j|}. \quad \text{QED.}$$

A.4 Proof of Theorem 4

Let us compute a special case first. Consider $P[T_1 = 0, T_n = 0]$.
A typical path of the underlying Markov chain $\{X_1, X_2, \cdots, X_{kn}\}$
may be of the following form:

$$b(x_k, x_{k+1}, x_{(n-1)k}, x_{(n-1)k+1}) = (\underbrace{x_1x_2\cdots x_k}_{\text{first block of size } k}\; x_{k+1}\cdots x_{(n-1)k}\; \underbrace{x_{(n-1)k+1}x_{(n-1)k+2}\cdots x_{nk}}_{\text{last block of size } k}). \tag{37}$$

The rest of the x's in b (and in $W_1$, $W_2$ later) are omitted for
typographical reasons. Note that the values of $x_{k+2}, \cdots, x_{(n-1)k-1}$ are
deliberately unspecified; also, n > 2 is assumed pro tem.

For fixed first block and last block, there are four different kinds of
paths, according to the values of $x_{k+1}$, $x_{(n-1)k}$. Let m denote the number
of 2's in the first and last blocks together. We have

$$P[T_1 = 0, T_n = 0 \mid b(x_k, x_{k+1}, x_{(n-1)k}, x_{(n-1)k+1})] = h^m \tag{38}$$

and

$$P[b(x_k, x_{k+1}, x_{(n-1)k}, x_{(n-1)k+1})] = W_1(x_k)\,p_{x_kx_{k+1}}\,p^{((n-2)k-1)}_{x_{k+1}x_{(n-1)k}}\,p_{x_{(n-1)k}x_{(n-1)k+1}}\,W_2(x_{(n-1)k+1}), \tag{39}$$

where

$$W_1(x_k) = P[X_1 = x_1, X_2 = x_2, \cdots, X_{k-1} = x_{k-1}, X_k = x_k]$$

and

$$W_2(x_{(n-1)k+1}) = P[X_{(n-1)k+2} = x_{(n-1)k+2}, \cdots, X_{nk} = x_{nk} \mid X_{(n-1)k+1} = x_{(n-1)k+1}].$$

We may also find $P[T_1 = 0]\cdot P[T_n = 0]$ by considering their
conditional probabilities over the first and the nth blocks. It is not
difficult to see that

$$P[T_1 = 0]P[T_n = 0] = \sum h^m\,W_1(x_k)\,W_2(x_{(n-1)k+1})\,\pi_{x_{(n-1)k+1}}, \tag{40}$$

where the summation ranges over all $2^{2k}$ possible blocks. The expression
for $P[T_1 = 0, T_n = 0]$ can be obtained by taking the product of
(38) and (39) and summing over all $2^{2k+2}$ possibilities. The $2^{2k+2}$ terms
in this form of $P[T_1 = 0, T_n = 0]$ outnumber the terms in (40)
by a margin of 4 to 1, and there is an obvious 4:1 correspondence
between these terms. Consider

$$\operatorname{Cov}(T_1, T_n) = \operatorname{Cov}(1 - T_1, 1 - T_n) = P[T_1 = 0, T_n = 0] - P[T_1 = 0]P[T_n = 0]. \tag{41}$$

For fixed first and last blocks, a typical difference between the (4:1)
correspondent terms is

$$h^m\,[W_1(x_k)W_2(x_{(n-1)k+1})]\Big[\sum_{a,b \in \{1,2\}} p_{x_ka}\,p^{((n-2)k-1)}_{ab}\,p_{b\,x_{(n-1)k+1}} - \pi_{x_{(n-1)k+1}}\Big]. \tag{42}$$

By (10), it can be shown that the third factor of (42) becomes

$$p^{((n-2)k+1)}_{x_kx_{(n-1)k+1}} - \pi_{x_{(n-1)k+1}} = \begin{cases} \dfrac{P}{p + P}\,\lambda^{(n-2)k+1} & \text{if } (x_k, x_{(n-1)k+1}) = (1, 1), \\[1ex] -\dfrac{P}{p + P}\,\lambda^{(n-2)k+1} & \text{if } (x_k, x_{(n-1)k+1}) = (1, 2), \\[1ex] -\dfrac{p}{p + P}\,\lambda^{(n-2)k+1} & \text{if } (x_k, x_{(n-1)k+1}) = (2, 1), \\[1ex] \dfrac{p}{p + P}\,\lambda^{(n-2)k+1} & \text{if } (x_k, x_{(n-1)k+1}) = (2, 2). \end{cases} \tag{43}$$
Note that in all terms we have a common factor X ( " 2 > fc+1 . By factoring 
out this common factor, Theorem 4 follows immediately. 
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We may even push the computations further to find an exact 
expression for the constant Ci in Theorem 4. Note that we have four 
types of combinations of blocks, according to the values of x* and 
.T(„_fc )+ i. The quantity in (42) becomes 

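$$\begin{aligned}
(1, 1) &\Rightarrow \phantom{-}h^m W_1(1)\,W_2(1)\,\frac{pP}{(p+P)^2}\,\lambda^{(n-2)k+1}, \\
(1, 2) &\Rightarrow -h^m W_1(1)\,W_2(2)\,\frac{pP}{(p+P)^2}\,\lambda^{(n-2)k+1}, \\
(2, 1) &\Rightarrow -h^m W_1(2)\,W_2(1)\,\frac{pP}{(p+P)^2}\,\lambda^{(n-2)k+1}, \\
(2, 2) &\Rightarrow \phantom{-}h^m W_1(2)\,W_2(2)\,\frac{pP}{(p+P)^2}\,\lambda^{(n-2)k+1},
\end{aligned} \tag{44}$$

and $\operatorname{Cov}(T_1, T_n)$ is the sum of all $2^{2k}$ terms in (44).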
Let

$$T'_1 = z_1 + z_2 + \cdots + z_{k-1}, \qquad T'_2 = z_{k+2} + \cdots + z_{2k}.$$

(We should use $T'_n = z_{(n-1)k+2} + \cdots + z_{nk}$; however, the distribution
of $T'_n$ is independent of $n$ so we may take $n = 2$.) Then

$$\begin{aligned}
&P[T'_1 = 0 \mid X_k = 1]\,P[T'_2 = 0 \mid X_{k+1} = 1] \\
&\quad = \sum h^m P[X_1 = x_1, \ldots, X_{k-1} = x_{k-1} \mid X_k = 1] \cdot P[X_{k+2} = x_{k+2}, \ldots, X_{2k} = x_{2k} \mid X_{k+1} = 1] \\
&\quad = \sum h^m W_1(1)\,W_2(1),
\end{aligned}$$

where the sums range over the $2^{2(k-1)}$ sequences and $m$ denotes the number
of 2's in the sequence $x_1, x_2, \ldots, x_{k-1}, x_{k+2}, \ldots, x_{2k}$, which is
equal to the number of 2's in the sequence $x_1, x_2, \ldots, x_{2k}$ in the case
$x_k = x_{k+1} = 1$. Thus, the sum of terms of the type (1, 1) in (44) is simply

$$\pi_1\,P[T'_1 = 0 \mid X_k = 1]\,P[T'_2 = 0 \mid X_{k+1} = 1] \cdot \pi_2\,\lambda^{(n-2)k+1}.$$

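Similarly, we may find the sums of other types of terms in (44).
We have

$$\begin{aligned}
(1, 2) &\Rightarrow -\pi_1 h\,P[T'_1 = 0 \mid X_k = 1]\,P[T'_2 = 0 \mid X_{k+1} = 2] \cdot \pi_2\,\lambda^{(n-2)k+1}, \\
(2, 1) &\Rightarrow -\pi_2 h\,P[T'_1 = 0 \mid X_k = 2]\,P[T'_2 = 0 \mid X_{k+1} = 1] \cdot \pi_1\,\lambda^{(n-2)k+1}, \\
(2, 2) &\Rightarrow \phantom{-}\pi_2 h^2\,P[T'_1 = 0 \mid X_k = 2]\,P[T'_2 = 0 \mid X_{k+1} = 2] \cdot \pi_1\,\lambda^{(n-2)k+1}.
\end{aligned}$$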
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Thus, if n > 2, i = 1, 
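$$\begin{aligned}
\operatorname{Cov}(T_1, T_n) &= \operatorname{Cov}(T_i, T_{i+n-1}) \\
&= \pi_1 \pi_2\,\lambda^{(n-2)k+1}\,\bigl[\,P[T'_1 = 0 \mid X_k = 1]\,P[T'_2 = 0 \mid X_{k+1} = 1] \\
&\qquad - h\,P[T'_1 = 0 \mid X_k = 1]\,P[T'_2 = 0 \mid X_{k+1} = 2] \\
&\qquad - h\,P[T'_1 = 0 \mid X_k = 2]\,P[T'_2 = 0 \mid X_{k+1} = 1] \\
&\qquad + h^2\,P[T'_1 = 0 \mid X_k = 2]\,P[T'_2 = 0 \mid X_{k+1} = 2]\,\bigr].
\end{aligned} \tag{45}$$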

The case $n = 2$ should be considered separately; this is because
$(n - 2)k - 1 < 0$ if $n = 2$ so that (39) simplifies to

$$P[b(x_k, x_{k+1})] = W_1(x_k)\,p_{x_k x_{k+1}}\,W_2(x_{k+1}). \tag{46}$$

In this case, the number of terms in $P[T_1 = 0, T_2 = 0]$ equals the
number of terms in the product $P[T_1 = 0]\,P[T_2 = 0]$ and there is
an obvious one-to-one correspondence between the terms. Consider the
difference $P[T_1 = 0, T_2 = 0] - P[T_1 = 0]\,P[T_2 = 0]$. For fixed
first and last (second) blocks, the term-wise difference is

$$h^m \pi_{x_k} W_1(x_k)\,W_2(x_{k+1})\,[\,p_{x_k x_{k+1}} - \pi_{x_{k+1}}\,]. \tag{47}$$

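The last factor in (47) can be computed. We have

$$p_{x_k x_{k+1}} - \pi_{x_{k+1}} = \begin{cases}
\phantom{-}\dfrac{P}{p+P}\,\lambda & \text{if } (x_k, x_{k+1}) = (1, 1) \\[1ex]
-\dfrac{P}{p+P}\,\lambda & \text{if } (x_k, x_{k+1}) = (1, 2) \\[1ex]
-\dfrac{p}{p+P}\,\lambda & \text{if } (x_k, x_{k+1}) = (2, 1) \\[1ex]
\phantom{-}\dfrac{p}{p+P}\,\lambda & \text{if } (x_k, x_{k+1}) = (2, 2).
\end{cases} \tag{48}$$

Note that (43) reduces to (48) if $n = 2$; hence, all arguments leading
to (45) hold true even if $n = 2$.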

Let $C_2$ be the quantity in the large square bracket of (45); we may
write (45) as

$$\operatorname{Cov}(T_i, T_j) = C_2\,\pi_1 \pi_2\,\lambda^{(|i-j|-1)k+1} \tag{49}$$

if $i \neq j$.
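The covariance formula can be checked numerically. The sketch below is only an illustration under stated assumptions: the chain is stationary, state 1 is error-free, state 2 produces a correct bit with probability $h$, and $T_i = 1$ exactly when block $i$ (of $k$ bits) contains at least one error. It compares $P[T_1 = 0, T_n = 0] - P[T_1 = 0]P[T_n = 0]$, computed by a forward recursion, with $\pi_1\pi_2\lambda^{(n-2)k+1}$ times the four-term bracket. All function names are hypothetical.

```python
def chain(P, p):
    """Transition matrix and stationary law; index 0 = state 1, index 1 = state 2."""
    return [[1.0 - P, P], [p, 1.0 - p]], [p / (p + P), P / (p + P)]


def prob_clean(P, p, h, length, watched):
    """P[no error at the watched positions] for a stationary chain (forward recursion)."""
    T, pi = chain(P, p)
    correct = [1.0, h]          # assumed chance of a correct bit in each state
    v = pi[:]
    for t in range(length):
        if t > 0:
            v = [v[0] * T[0][s] + v[1] * T[1][s] for s in range(2)]
        if t in watched:
            v = [v[s] * correct[s] for s in range(2)]
    return v[0] + v[1]


def cov_direct(P, p, h, k, n):
    """P[T_1 = 0, T_n = 0] - P[T_1 = 0] P[T_n = 0] for blocks of k bits."""
    first = set(range(k))
    last = set(range((n - 1) * k, n * k))
    joint = prob_clean(P, p, h, n * k, first | last)
    return joint - prob_clean(P, p, h, n * k, first) * prob_clean(P, p, h, n * k, last)


def cov_formula(P, p, h, k, n):
    """pi_1 pi_2 lambda^((n-2)k+1) times the four-term bracket."""
    T, pi = chain(P, p)
    correct = [1.0, h]
    g = [1.0, 1.0]              # g[a] = P[next k-1 bits correct | current state a]
    for _ in range(k - 1):
        g = [sum(T[a][b] * correct[b] * g[b] for b in range(2)) for a in range(2)]
    f = g                       # a stationary two-state chain is reversible
    bracket = (f[0] * g[0] - h * f[0] * g[1]
               - h * f[1] * g[0] + h * h * f[1] * g[1])
    lam = 1.0 - p - P
    return pi[0] * pi[1] * lam ** ((n - 2) * k + 1) * bracket
```

For example, `cov_direct(0.1, 0.3, 0.6, 3, 4)` and `cov_formula(0.1, 0.3, 0.6, 3, 4)` should agree up to rounding, and the same holds for $n = 2$.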

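It is possible to find the value of $C_2$ through an argument similar to
that of finding $g_n$ in Theorem 1. However, we shall be satisfied with a
crude estimate

$$|\pi_1 \pi_2 C_2| \leq 1,$$

which follows trivially from $\pi_1 \pi_2 = \pi_1(1 - \pi_1) \leq 1/4$ and from
$|C_2| \leq 4$, each of the four terms in the bracket of (45) being at most 1
in absolute value.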



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A. 5 Proof of Theorem 5 

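Let $H = (1 - h)u + h$ and let $\hat\alpha_1, \hat\alpha_2, \hat A_1, \hat A_2, \hat\rho$ be the quantities
obtained from $\alpha_1, \alpha_2, A_1, A_2, \rho$ by replacing each $h$ with $H$ respectively.
As $n \to \infty$, $H \to 1$. Hence,

$$\hat\alpha_1 \to 1, \qquad \hat\alpha_2 \to 1 - p < 1, \qquad \hat A_2 \to 0, \qquad \hat\rho \to 0.$$

It follows from Theorem 2 that

$$\lim E u^{Z_n} = \lim \hat A_1 \hat\alpha_1^{\,n+1}. \tag{50}$$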

Let $\Delta^2 = [1 - P - H(1 - p)]^2 + 4pPH$ in the expression of $\hat A_1$.
An important step in our argument is to find the value of $\Delta$. By
substituting (24) into the expression for $\Delta^2$, we have

$$\Delta^2 = p^2 + \sum_{n=1}^{\infty} \gamma_n x^n + \sum_{n=1}^{\infty} \delta_n y^n + \epsilon xy + o(xy), \tag{51}$$

where

$$\begin{aligned}
\gamma_{2n} &= (1 - p)^2 (1 - u)^2 (a_n^2 + 2a_1 a_{2n-1} + 2a_2 a_{2n-2} + \cdots + 2a_{n-1} a_{n+1}) + 2p(1 - p)(1 - u)a_{2n}, \\
\gamma_{2n+1} &= (1 - p)^2 (1 - u)^2 (2a_1 a_{2n} + 2a_2 a_{2n-1} + \cdots + 2a_n a_{n+1}), \\
\delta_{2n} &= b_n^2 + 2b_1 b_{2n-1} + 2b_2 b_{2n-2} + \cdots + 2b_{n-1} b_{n+1}, \\
\delta_{2n+1} &= 2b_1 b_{2n} + 2b_2 b_{2n-1} + \cdots + 2b_n b_{n+1}, \\
\epsilon &= -2(1 - p)(1 - u)a_1 b_1 - 4p(1 - u)a_1 b_1.
\end{aligned}$$

Let

$$\Delta = p + \sum_{n=1}^{\infty} (d_n x^n + e_n y^n) + fxy + o(xy). \tag{52}$$

By comparing the square of $\Delta$ in (52) with the same quantity in (51), it is not
difficult to see that

$$d_k = (1 - p)(1 - u)a_k, \qquad e_k = b_k \tag{53}$$

for $k = 1, 2, \ldots$. Also, it is easy to find that

$$f = -\frac{2}{p}(1 - u)a_1 b_1. \tag{54}$$
Using (53), it can be seen that in the expression of $\hat\alpha_1$, the coefficients
of $x^k$, $y^k$ are zero for all $k$. Hence, recalling that $xy = 1/n$,

$$\hat\alpha_1 = 1 - \frac{(1 - u)a_1 b_1}{p}\,xy + o(xy) = 1 - \frac{(1 - u)a_1 b_1}{pn} + o\!\left(\frac{1}{n}\right). \tag{55}$$
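A quick numerical sanity check of the expansion may help here. The sketch below assumes the simplest reading of (24), namely $1 - h = a_1 x$ and $P = b_1 y$ with all higher coefficients zero, and assumes that $\hat\alpha_1 = [1 - P + H(1 - p) + \Delta]/2$; both are assumptions made for illustration, and the function names are hypothetical. It compares $\Delta$ with $p + d_1 x + e_1 y + fxy$ and checks that $(1 - \hat\alpha_1)/(xy)$ is close to $(1 - u)a_1 b_1/p$ for small $x$, $y$.

```python
import math


def delta_exact(p, P, h, u):
    """Delta = sqrt([1 - P - H(1 - p)]^2 + 4 p P H) with H = (1 - h) u + h."""
    H = (1.0 - h) * u + h
    return math.sqrt((1.0 - P - H * (1.0 - p)) ** 2 + 4.0 * p * P * H)


def delta_expansion(p, u, a1, b1, x, y):
    """p + d_1 x + e_1 y + f x y, with a_k = b_k = 0 for k >= 2 (assumed)."""
    d1 = (1.0 - p) * (1.0 - u) * a1
    e1 = b1
    f = -2.0 * (1.0 - u) * a1 * b1 / p
    return p + d1 * x + e1 * y + f * x * y


def relative_remainder(p, u, a1, b1, x, y):
    """|exact - expansion| / (x y); should shrink as x, y -> 0 if the remainder is o(xy)."""
    exact = delta_exact(p, b1 * y, 1.0 - a1 * x, u)
    return abs(exact - delta_expansion(p, u, a1, b1, x, y)) / (x * y)


def scaled_root_gap(p, u, a1, b1, x, y):
    """(1 - alpha1_hat) / (x y), using the assumed form of the larger root."""
    P, h = b1 * y, 1.0 - a1 * x
    H = (1.0 - h) * u + h
    alpha1 = 0.5 * (1.0 - P + H * (1.0 - p) + delta_exact(p, P, h, u))
    return (1.0 - alpha1) / (x * y)
```

With `p = u = 0.5` and `a1 = b1 = 1`, the target value $(1 - u)a_1 b_1/p$ equals 1, so `scaled_root_gap(0.5, 0.5, 1.0, 1.0, 1e-4, 1e-4)` should be close to 1.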

By (55) and (24), it can be seen that 

j,. «*»Q-") + ,(iy ( 5 6) 

pn \ n ) 

By (50), (55), and (56), we have

$$\lim_{n \to \infty} E u^{Z_n} = \exp\!\left[-(1 - u)\,\frac{a_1 b_1}{p}\right],$$

which is the PGF of the Poisson distribution with mean equal to
$a_1 b_1 / p$.
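The Poisson limit can be illustrated numerically. The sketch below makes several assumptions that are not taken from the text: the chain starts in its stationary law, the scaling is $1 - h = a_1 x$, $P = b_1 y$ with $x = y = n^{-1/2}$ (so that $xy = 1/n$), and the exact PGF is obtained by replacing $h$ with $H = (1 - h)u + h$ in the probability of an error-free run. Function names are hypothetical.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][r] * B[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(A, e):
    """A**e for a 2x2 matrix by repeated squaring."""
    R = [[1.0, 0.0], [0.0, 1.0]]
    while e > 0:
        if e & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        e >>= 1
    return R


def exact_pgf(P, p, h, u, n):
    """E[u^Z_n] for a stationary chain, with Z_n the number of errors in n bits."""
    H = (1.0 - h) * u + h
    pi = [p / (p + P), P / (p + P)]
    w = [1.0, H]                               # per-bit weight in states 1 and 2
    M = [[(1.0 - P) * w[0], P * w[1]], [p * w[0], (1.0 - p) * w[1]]]
    Mp = mat_pow(M, n - 1)
    return sum(pi[a] * w[a] * Mp[a][b] for a in range(2) for b in range(2))


def poisson_pgf(mean, u):
    """PGF of the Poisson distribution."""
    return math.exp(-(1.0 - u) * mean)


def gap(n, p=0.5, a1=1.0, b1=1.0, u=0.5):
    """|exact PGF - Poisson PGF| under the assumed scaling x = y = n**-0.5."""
    x = y = n ** -0.5
    return abs(exact_pgf(b1 * y, p, 1.0 - a1 * x, u, n) - poisson_pgf(a1 * b1 / p, u))
```

Under these assumptions `gap(n)` should shrink as `n` grows; convergence is slow because both $x$ and $y$ decay only like $n^{-1/2}$.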

A. 6 Proof of Theorem 6 

Using (26), we may express $\hat\alpha_1$, $\hat A_1$ in terms of powers of $1/n$ as






(57) 



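where

$$a = \frac{b_1(1 - H)}{1 - H + pH}, \qquad H = (1 - h)u + h. \tag{58}$$

By eqs. (50), (57), and (58), we have

$$\lim_{n \to \infty} E u^{Z_n} = \lim_{n \to \infty} \hat A_1 \cdot \lim_{n \to \infty} \hat\alpha_1^{\,n+1} = \exp[-a] = \exp\!\left[-\frac{b_1(1 - H)}{1 - H + pH}\right]. \tag{59}$$

This proves Theorem 6.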
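As a hedged numerical illustration of this limit, the sketch below takes $h$ fixed and $P = b_1/n$ (the leading term only, an assumed reading of (26)), and assumes that $\hat\alpha_1$ is the larger root of $z^2 - [1 - P + H(1 - p)]z + H(1 - p - P) = 0$, whose discriminant is $[1 - P - H(1 - p)]^2 + 4pPH$. It checks that $n(1 - \hat\alpha_1)$ approaches $a = b_1(1 - H)/(1 - H + pH)$, so that $\hat\alpha_1^{\,n+1}$ approaches $\exp[-a]$. Function names and parameter values are hypothetical.

```python
import math


def alpha1_hat(P, p, h, u):
    """Larger root of z^2 - [1 - P + H(1 - p)] z + H(1 - p - P) = 0 (assumed form)."""
    H = (1.0 - h) * u + h
    delta = math.sqrt((1.0 - P - H * (1.0 - p)) ** 2 + 4.0 * p * P * H)
    return 0.5 * (1.0 - P + H * (1.0 - p) + delta)


def limit_exponent(b1, p, h, u):
    """a = b_1 (1 - H) / (1 - H + pH) with H = (1 - h) u + h."""
    H = (1.0 - h) * u + h
    return b1 * (1.0 - H) / (1.0 - H + p * H)


def scaled_gap(n, b1=2.0, p=0.3, h=0.5, u=0.4):
    """n (1 - alpha1_hat) with P = b_1 / n; expected to approach a as n grows."""
    return n * (1.0 - alpha1_hat(b1 / n, p, h, u))


def root_power(n, b1=2.0, p=0.3, h=0.5, u=0.4):
    """alpha1_hat ** (n + 1) with P = b_1 / n; expected to approach exp(-a)."""
    return alpha1_hat(b1 / n, p, h, u) ** (n + 1)
```

Because $a$ depends on $u$ through $H$ in a non-linear way, the limit $\exp[-a]$ is not a Poisson PGF in general, which is consistent with the compound Poisson approximation mentioned in the abstract.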
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